Building Taxonomies: Data Models to Remove Ambiguity from AI and Search | S2 E8

Update: 2024-10-04

Description

Today we have Jessica Talisman with us, who works as an Information Architect at Adobe. She is (in my opinion) the expert on taxonomies and ontologies.

That’s what you will learn about today in this episode of How AI Is Built: taxonomies, ontologies, and knowledge graphs.

Everyone is talking about them, but no one knows how to build them.

But before we look into that, what are they good for in search?

Imagine a large corpus of academic papers. When a user searches for "machine learning in healthcare", the system can:

Recognize "machine learning" as a subcategory of "artificial intelligence"
Identify "healthcare" as a broad field with subfields like "diagnostics" and "patient care"
Use these relationships to expand the query or narrow it down
Return papers on "neural networks for medical imaging" or "predictive analytics in patient outcomes", even though these exact phrases weren't in the search query
Filter out papers that are not tagged with AI and only mention it in a side note
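A minimal Python sketch of the expansion and filtering steps above, assuming a toy hand-written taxonomy stored as a dictionary of narrower terms. The names `NARROWER`, `descendants`, `expand_query`, `matches` and `filter_by_tag` are hypothetical, and naive substring matching stands in for a real search engine.

```python
# Hypothetical toy taxonomy: each concept maps to its narrower concepts.
NARROWER: dict[str, list[str]] = {
    "artificial intelligence": ["machine learning"],
    "machine learning": ["neural networks", "predictive analytics"],
    "healthcare": ["diagnostics", "patient care"],
    "diagnostics": ["medical imaging"],
    "patient care": ["patient outcomes"],
}


def descendants(concept: str) -> set[str]:
    """Return the concept plus every concept below it in the hierarchy."""
    found = {concept}
    stack = [concept]
    while stack:
        for child in NARROWER.get(stack.pop(), []):
            if child not in found:
                found.add(child)
                stack.append(child)
    return found


def expand_query(concepts: list[str]) -> list[set[str]]:
    """Turn each query concept into a group: itself plus its narrower terms."""
    return [descendants(c) for c in concepts]


def matches(text: str, groups: list[set[str]]) -> bool:
    """AND across groups, OR within a group (naive substring matching)."""
    lowered = text.lower()
    return all(any(term in lowered for term in group) for group in groups)


def filter_by_tag(docs: list[dict], required: str) -> list[dict]:
    """Keep documents tagged with the required concept or one of its descendants."""
    allowed = descendants(required)
    return [d for d in docs if allowed & set(d["tags"])]


docs = [
    {"title": "Neural networks for medical imaging", "tags": ["neural networks", "diagnostics"]},
    {"title": "Predictive analytics in patient outcomes", "tags": ["predictive analytics", "patient care"]},
    {"title": "Healthcare staffing trends (machine learning in a side note)", "tags": ["healthcare"]},
]

groups = expand_query(["machine learning", "healthcare"])
hits = [d for d in docs if matches(d["title"], groups)]  # all three match the text
ai_papers = filter_by_tag(hits, "artificial intelligence")  # staffing paper dropped
print([d["title"] for d in ai_papers])
```

Expansion lets the imaging and analytics papers match even though neither contains the literal phrase "machine learning in healthcare", while the tag filter drops the staffing paper that only mentions machine learning in passing.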

So we are building the plumbing: the infrastructure for tagging, categorization, query expansion and relaxation, and filtering.
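Query relaxation runs the other way: when a narrow concept returns too few results, the search falls back to its broader parent. A minimal sketch, assuming a hypothetical child-to-parent map `BROADER` and a toy tag index `INDEX`; `relaxed_search` is illustrative and not any particular engine's API.

```python
# Hypothetical child -> parent ("broader") links in the taxonomy.
BROADER: dict[str, str] = {
    "neural networks": "machine learning",
    "predictive analytics": "machine learning",
    "machine learning": "artificial intelligence",
}

# Hypothetical index: concept tag -> ids of documents tagged with it.
INDEX: dict[str, list[str]] = {
    "machine learning": ["doc-1", "doc-2"],
    "artificial intelligence": ["doc-3"],
}


def relaxed_search(concept: str, min_results: int = 1) -> tuple[str, list[str]]:
    """Climb the broader chain until at least min_results documents are collected.

    Documents found at narrower levels are kept. Returns the concept where the
    search stopped and the collected document ids.
    """
    current = concept
    collected: list[str] = []
    while True:
        for doc_id in INDEX.get(current, []):
            if doc_id not in collected:
                collected.append(doc_id)
        if len(collected) >= min_results or current not in BROADER:
            return current, collected
        current = BROADER[current]


print(relaxed_search("neural networks"))
print(relaxed_search("neural networks", min_results=3))
```

Nothing is tagged "neural networks" in this toy index, so the first call relaxes one level to "machine learning"; asking for three results climbs to "artificial intelligence". Walking only the parent chain ignores sibling concepts; a fuller version might search the whole subtree under the broader concept.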

So how can we build them?

1️⃣ Start with Industry Standards • Leverage established taxonomies (e.g., Google, GS1, IAB) • Audit them for relevance to your project • Use as a foundation, not a final solution

2️⃣ Customize and Fill Gaps • Adapt industry taxonomies to your specific domain • Create a "coverage model" for your unique needs • Mine internal docs to identify domain-specific concepts

3️⃣ Follow Ontology Best Practices • Use clear, unique primary labels for each concept • Include definitions to avoid ambiguity • Provide context for each taxonomy node
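Parts of steps 2 and 3 can be checked automatically. A rough Python sketch, assuming taxonomy nodes are stored as records loosely modeled on SKOS (prefLabel, altLabel, definition, broader). The `Concept` class, `validate` and `coverage_gaps` are hypothetical helpers, and counting single words in internal documents is a deliberately naive stand-in for real concept mining.

```python
import re
from collections import Counter
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Concept:
    """One taxonomy node, loosely modeled on SKOS prefLabel/altLabel/definition/broader."""
    concept_id: str
    pref_label: str
    definition: str = ""
    alt_labels: list[str] = field(default_factory=list)
    broader: Optional[str] = None  # concept_id of the parent node, if any


def validate(concepts: list[Concept]) -> list[str]:
    """Flag duplicate primary labels, missing definitions and unknown parents."""
    problems: list[str] = []
    ids = {c.concept_id for c in concepts}
    seen: dict[str, str] = {}  # normalized label -> first concept_id using it
    for c in concepts:
        key = c.pref_label.strip().lower()
        if key in seen:
            problems.append(f"{c.concept_id}: label '{c.pref_label}' duplicates {seen[key]}")
        else:
            seen[key] = c.concept_id
        if not c.definition.strip():
            problems.append(f"{c.concept_id}: missing definition")
        if c.broader is not None and c.broader not in ids:
            problems.append(f"{c.concept_id}: unknown broader concept {c.broader}")
    return problems


def coverage_gaps(concepts: list[Concept], internal_docs: list[str], min_count: int = 2) -> list[str]:
    """Return frequent words in internal docs that no label (pref or alt) covers."""
    covered: set[str] = set()
    for c in concepts:
        for label in [c.pref_label, *c.alt_labels]:
            covered.update(label.lower().split())
    # Naive mining: count lowercase words longer than three letters.
    counts = Counter(
        word
        for doc in internal_docs
        for word in re.findall(r"[a-z]+", doc.lower())
        if len(word) > 3
    )
    return [w for w, n in counts.most_common() if n >= min_count and w not in covered]


concepts = [
    Concept("c1", "Artificial Intelligence", "Systems that perform tasks associated with human intelligence."),
    Concept("c2", "Machine Learning", "Methods that learn patterns from data.", ["ML"], broader="c1"),
    Concept("c3", "machine learning", broader="c9"),  # duplicate label, no definition, bad parent
]
internal_docs = [
    "Federated learning pilots for radiology scans",
    "Radiology triage with machine learning",
    "Federated learning governance notes",
]

print(validate(concepts))
print(coverage_gaps(concepts[:2], internal_docs))
```

Here `validate` flags all three problems on `c3`, and `coverage_gaps` surfaces "federated" and "radiology" as candidate concepts that the adopted labels do not yet cover.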

Jessica Talisman:

Nicolay Gerold:

00:00 Introduction to Taxonomies and Knowledge Graphs
02:03 Building the Foundation: Metadata to Knowledge Graphs
04:35 Industry Taxonomies and Coverage Models
06:32 Clustering and Labeling Techniques
11:00 Evaluating and Maintaining Taxonomies
31:41 Exploring Taxonomy Granularity
32:18 Differentiating Taxonomies for Experts and Users
33:35 Mapping and Equivalency in Taxonomies
34:02 Best Practices and Examples of Taxonomies
40:50 Building Multilingual Taxonomies
44:33 Creative Applications of Taxonomies
48:54 Overrated and Underappreciated Technologies
53:00 The Importance of Human Involvement in AI
53:57 Connecting with the Speaker
55:05 Final Thoughts and Takeaways